
starknet_transaction_prover: /health returns 503 when service is saturated #14171

Open
avi-starkware wants to merge 1 commit into avi/prover-v3/panic-counter from avi/prover-v3/saturation-health


@avi-starkware
Collaborator

Adds SaturationMonitor (shared by ProvingRpcServerImpl and
HealthLayer) that tracks whether the concurrency semaphore has been
continuously rejecting proving requests. Once that has held for the
configured window (health_max_saturated_ms, default 10s), /health
returns 503 with an opaque body so load balancers can drain the pod
before in-flight requests start failing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
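
To make the described behavior concrete, here is a minimal sketch of how such a monitor could track a continuous run of rejections. It is not the PR's implementation: the field names, the method names (`record_rejection_at`, `record_acquired`, `is_saturated_at`), and the choice to pass the current time explicitly are all assumptions made for illustration and testability.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical sketch of a saturation monitor; names and layout are assumed,
/// not taken from the PR's code.
pub struct SaturationMonitor {
    /// Window after which continuous rejection counts as saturation.
    max_saturated: Duration,
    /// Start of the current run of rejections; `None` after a successful acquire.
    saturated_since: Mutex<Option<Instant>>,
}

impl SaturationMonitor {
    pub fn new(max_saturated: Duration) -> Self {
        Self { max_saturated, saturated_since: Mutex::new(None) }
    }

    /// Called when the concurrency semaphore rejects a proving request.
    /// Only the first rejection in a run starts the timer.
    pub fn record_rejection_at(&self, now: Instant) {
        let mut since = self.saturated_since.lock().unwrap();
        if since.is_none() {
            *since = Some(now);
        }
    }

    /// Called when a permit is acquired; clears the timer.
    pub fn record_acquired(&self) {
        *self.saturated_since.lock().unwrap() = None;
    }

    /// True once rejections have been continuous for at least the window.
    pub fn is_saturated_at(&self, now: Instant) -> bool {
        match *self.saturated_since.lock().unwrap() {
            Some(start) => now.saturating_duration_since(start) >= self.max_saturated,
            None => false,
        }
    }
}
```

In this sketch a single successful acquire resets the window, so brief bursts of rejections never flip the health state; only a run that lasts the full window does.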

@cursor

cursor Bot commented May 24, 2026

PR Summary

Medium Risk
Changes load-balancer health behavior for the prover; mis-tuned health_max_saturated_ms could drain pods too early or too late, but scope is limited to health/concurrency signaling with tests.

Overview
Introduces SaturationMonitor shared between the proving RPC handler and HealthLayer: concurrency rejections start a timer; a successful permit acquire clears it. After health_max_saturated_ms (default 10s, CLI/env HEALTH_MAX_SATURATED_MS) of continuous rejections, GET /health returns 503 with an opaque {"status":"unhealthy","reason":"saturated"} body so load balancers can drain the pod; recovery returns 200.
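
As a rough illustration of the behavior summarized above, the sketch below maps a saturation flag to a status code and body, and resolves the window from an optional raw setting. The function names (`health_response`, `max_saturated_from`) and the constant name are hypothetical, and the healthy-response body is an assumption because the summary quotes only the unhealthy one.

```rust
use std::num::ParseIntError;
use std::time::Duration;

/// Default window from the PR description (10s), expressed in milliseconds.
/// The constant name itself is hypothetical.
pub const DEFAULT_HEALTH_MAX_SATURATED_MS: u64 = 10_000;

/// Hypothetical helper: maps the saturation state to (HTTP status, body).
pub fn health_response(saturated: bool) -> (u16, &'static str) {
    if saturated {
        // Opaque body quoted in the PR summary.
        (503, r#"{"status":"unhealthy","reason":"saturated"}"#)
    } else {
        // Assumption: the healthy body is not quoted in the PR; an empty
        // string stands in for it here.
        (200, "")
    }
}

/// Hypothetical helper: resolves the window from an optional raw value, such
/// as the contents of HEALTH_MAX_SATURATED_MS, falling back to the default.
pub fn max_saturated_from(raw: Option<&str>) -> Result<Duration, ParseIntError> {
    match raw {
        Some(value) => value.trim().parse::<u64>().map(Duration::from_millis),
        None => Ok(Duration::from_millis(DEFAULT_HEALTH_MAX_SATURATED_MS)),
    }
}
```

A caller could pass `std::env::var("HEALTH_MAX_SATURATED_MS").ok().as_deref()` as the raw value; how the PR actually reads the CLI flag and environment variable is not shown here.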

Wiring: main builds the monitor and a configured HealthLayer, passes both into ProvingRpcServerImpl and start_server / TLS (middleware macro now takes an explicit health_layer instead of a default HealthLayer).
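
The wiring described above amounts to building one monitor and handing a shared handle to both consumers. The sketch below shows that shape only: the three types are simplified stand-ins that reuse the PR's names, and the `wire` function, the `AtomicBool` state, and the public `monitor` fields are assumptions rather than the PR's code.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Simplified stand-in for the PR's monitor; a single flag replaces the timer.
#[derive(Default)]
pub struct SaturationMonitor {
    saturated: AtomicBool,
}

impl SaturationMonitor {
    pub fn set_saturated(&self, value: bool) {
        self.saturated.store(value, Ordering::Relaxed);
    }

    pub fn is_saturated(&self) -> bool {
        self.saturated.load(Ordering::Relaxed)
    }
}

/// Stand-in for the RPC handler, which reports rejections and acquires.
pub struct ProvingRpcServerImpl {
    pub monitor: Arc<SaturationMonitor>,
}

/// Stand-in for the health middleware, which only reads the state.
pub struct HealthLayer {
    pub monitor: Arc<SaturationMonitor>,
}

/// Hypothetical equivalent of what `main` does: build one monitor, then pass
/// a clone of the same `Arc` to each component.
pub fn wire() -> (ProvingRpcServerImpl, HealthLayer) {
    let monitor = Arc::new(SaturationMonitor::default());
    let health_layer = HealthLayer { monitor: Arc::clone(&monitor) };
    let rpc = ProvingRpcServerImpl { monitor };
    (rpc, health_layer)
}
```

Because both structs hold the same `Arc`, a state change made through the RPC side is immediately visible to the health side without any extra channel.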

Reviewed by Cursor Bugbot for commit b321f22. Bugbot is set up for automated code reviews on this repo.

@reviewable-StarkWare

This change is Reviewable

@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from cbd1def to e503ebd Compare May 24, 2026 16:48
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from 318c9c2 to 53381dd Compare May 24, 2026 16:48
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from e503ebd to db503b7 Compare May 26, 2026 08:43
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch 2 times, most recently from d477f5e to ef3cf0b Compare May 26, 2026 12:16
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 1da27e9 to ac98d86 Compare May 26, 2026 12:17
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from ef3cf0b to eb8da8d Compare May 26, 2026 12:17
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from ac98d86 to e4bbbdc Compare May 26, 2026 12:58
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from eb8da8d to e084131 Compare May 26, 2026 12:58
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from e4bbbdc to 06bb59e Compare May 26, 2026 16:14
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch 2 times, most recently from 171e482 to 158a680 Compare May 26, 2026 16:47
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 06bb59e to 0b2c8cc Compare May 26, 2026 16:47
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from 158a680 to b385d86 Compare May 26, 2026 16:59
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 0b2c8cc to 4b1caba Compare May 26, 2026 16:59
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from b385d86 to a462e96 Compare May 27, 2026 10:01
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 4b1caba to 05ed9b4 Compare May 27, 2026 10:01
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from a462e96 to a83176f Compare May 27, 2026 10:35
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 05ed9b4 to 72918b7 Compare May 27, 2026 10:35
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from a83176f to b4c05a6 Compare May 27, 2026 12:55
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch 2 times, most recently from 74f4f46 to 728f22c Compare May 27, 2026 13:11
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from b4c05a6 to 2739271 Compare May 27, 2026 13:11
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 728f22c to 966f499 Compare May 27, 2026 14:04
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from 2739271 to 89534f1 Compare May 27, 2026 14:04
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch from 89534f1 to d3f1139 Compare May 27, 2026 14:20

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit d3f1139.

Comment thread crates/starknet_transaction_prover/src/server/rpc_impl.rs
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from 646fb1e to a77477b Compare May 31, 2026 10:23
@avi-starkware avi-starkware force-pushed the avi/prover-v3/saturation-health branch 2 times, most recently from d15dc19 to def7ea4 Compare June 1, 2026 08:17
@avi-starkware avi-starkware force-pushed the avi/prover-v3/panic-counter branch from a77477b to 8017e9e Compare June 1, 2026 08:17